Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement insert_many<IntoIterator> for SmallVec #28

Merged
merged 2 commits into from
Jan 19, 2017

Conversation

nipunn1313
Copy link
Contributor

@nipunn1313 nipunn1313 commented Aug 20, 2016

This doesn't work as is, but I wanted to put it up for feedback.

According to these docs:
https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html

The copy_nonoverlapping function copies memory as intended, but the ownership of the values copied are not copied over to dest, so when the box is consumed, the SmallVec does not maintain ownership. Not sure how to do this with unsafe code.


This change is Reviewable

@SimonSapin
Copy link
Member

You can not take ownership of T values out of a &[T] slice. The & indicates that it’s a borrow. To keep using a slice, you need either to clone each value or require Copy.

Maybe it’s better to take an IntoIterator rather than a slice as it’s more general and avoids the ownership problem. (Users can choose to call .iter().cloned() if they’re starting with a slice.) But then you have to deal with size_hint being inexact or even wrong, and it kinda gets messy. See rust-lang/rust#32355

@nipunn1313 nipunn1313 changed the title Try to implement insert_slice Implement insert_slice for SmallVec Aug 20, 2016
@nipunn1313
Copy link
Contributor Author

Thank you for the feedback! I think I have a much better understanding of why it wasn't working.

I intended on adding the insert_slice to leverage memcpy, so it makes a lot of sense to restrict insert_slice to arrays whose types impl Copy. Insert_many taking an into_iter would also work, but it wouldn't leverage the memcpy advantage, so it would be purely ergonomic. Perhaps there is room for both, but I started here with insert_slice.

@nipunn1313
Copy link
Contributor Author

I spent a little more time thinking about this and realize that leveraging memcpy is the perfect use of impl specialization. I think I will adapt this diff to implement "insert_many" using an IntoIterator and create a separate PR for using specialization to get a memcpy leveraged implementation.

https://github.com/rust-lang/rfcs/blob/master/text/1210-impl-specialization.md

@SimonSapin
Copy link
Member

Unfortunately specialization is not stable yet and I’d like to keep this crate working on stable Rust.

Beyond that, there was also a very similar discussion in Rust’s standard library, about specializing Clone to memcpy where T: Copy. Some argued that doing so would be wrong since the Clone implementation is arbitrary and might have a different behavior. Some argued that such a Clone implementation would be buggy, and that std didn’t need to carter to it.

In the end, specialization was not used and &mut [T] mutable slices now have two methods: copy_from_slice which requires T: Copy and clone_from_slice which requires T: Clone.

@nipunn1313
Copy link
Contributor Author

Understood. I've adapted this diff to be completely iterator-compatible (take another look now) without using any memcpy type primitives or specialization. This diff should be safe to add as a backward compatible change in its current state, as long as we're willing to include an "insert_many" operation or something of the sort.

I'll create a new diff which will work around the lack of specialization by adding new methods (extend_slice and insert_slice) which take advantage of memcpy semantics for copy type slices.

@nipunn1313 nipunn1313 changed the title Implement insert_slice for SmallVec Implement insert_many<IntoIterator> for SmallVec Aug 23, 2016
@SimonSapin
Copy link
Member

Is there a specific need for all this? What is it? I’d rather not add lots of API surface just because we can.

@nipunn1313
Copy link
Contributor Author

We were looking to use this here. There was a spot where we needed to insert many items into a smallvec (preferably efficiently). Currently, we've rolled our own "ElasticArray" with this functionality, but would prefer to use a more supported library like this.

https://github.com/ethcore/parity/pull/1957

bors-servo pushed a commit that referenced this pull request Sep 11, 2016
Add benchmarks for push, insert, extend, and pushpop

Want to add these benchmarks so that we can more effectively detect regressions/improvements from PR's like #28 and #29.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-smallvec/31)
<!-- Reviewable:end -->
@nipunn1313
Copy link
Contributor Author

Here are the benchmarks as seen on my machine for this

Nipunns-MBP:rust-smallvec nipunn$ git checkout master ; cargo bench --features benchmarks bench ; git checkout insert_slice ; cargo bench --features benchmarks bench
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
    Finished release [optimized] target(s) in 2.55 secs
     Running target/release/deps/smallvec-a59689f47ae774cb

running 4 tests
test bench::bench_extend  ... bench:         152 ns/iter (+/- 18)
test bench::bench_insert  ... bench:       1,150 ns/iter (+/- 167)
test bench::bench_push    ... bench:         291 ns/iter (+/- 127)
test bench::bench_pushpop ... bench:         356 ns/iter (+/- 70)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

Switched to branch 'insert_slice'
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
    Finished release [optimized] target(s) in 2.74 secs
     Running target/release/deps/smallvec-a59689f47ae774cb

running 5 tests
test bench::bench_extend      ... bench:         166 ns/iter (+/- 25)
test bench::bench_insert      ... bench:       1,150 ns/iter (+/- 184)
test bench::bench_insert_many ... bench:         500 ns/iter (+/- 139)
test bench::bench_push        ... bench:         276 ns/iter (+/- 39)
test bench::bench_pushpop     ... bench:         366 ns/iter (+/- 124)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured

Looks like insert_many improves upon insert as expected.
Both insert and insert_many benchmarks are inserting 100 items to the beginning of an empty smallvec<16>. Both require a resize, but insert_many will be linear while the loop of insert will be quadratic after the resize.

@nipunn1313
Copy link
Contributor Author

This is definitely a micro-optimization sensitive piece of code, so each little change can affect benchmarks. Benchmark tests were invaluable in making sure this didn't regress!

@nipunn1313
Copy link
Contributor Author

I made some minor tweaks and noticed it changed the benchmark output for bench_insert (even though we didn't touch that function).

I disassembled the outputs and have come to the conclusion that different inlining is creating different performance characteristics. On master, the insert() function is inlined into the benchmark, but after my change, the compiler chose to make insert() a function call.

This is all pointing to somewhat bogus benchmarking because external callers of insert() shouldn't be seeing this cross-crate inlining. I think in order to get a true, sense the benchmarks need to be in a separate crate, so I will give that a try (maybe using the tests/ directory?).

Nipunns-MBP:rust-smallvec nipunn$ git checkout master ; cargo bench --features benchmarks bench ; git checkout insert_slice ; cargo bench --features benchmarks bench
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
    Finished release [optimized] target(s) in 2.46 secs
     Running target/release/deps/smallvec-a59689f47ae774cb

running 4 tests
test bench::bench_extend  ... bench:         153 ns/iter (+/- 5)
test bench::bench_insert  ... bench:       1,144 ns/iter (+/- 147)
test bench::bench_push    ... bench:         280 ns/iter (+/- 43)
test bench::bench_pushpop ... bench:         354 ns/iter (+/- 14)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

Switched to branch 'insert_slice'
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
git     Finished release [optimized] target(s) in 2.63 secs
     Running target/release/deps/smallvec-a59689f47ae774cb

running 5 tests
test bench::bench_extend      ... pubench:         157 ns/iter (+/- 16)
test bench::bench_insert      ... sh bench:       1,428 ns/iter (+/- 257)
test bench::bench_insert_many ... obench:         692 ns/iter (+/- 128)
test bench::bench_push        ... bench:         276 ns/iter (+/- 34)
test bench::bench_pushpop     ... bench:         354 ns/iter (+/- 111)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured

In any case, I think this iteration of the code is pretty good, but I'd like to improve the benchmarks so we don't see this regression due to inlining.

@nipunn1313
Copy link
Contributor Author

Going to wait for #32 to land first since it sets up the noinline bench wrapper to get more consistent results.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #32) made this pull request unmergeable. Please resolve the merge conflicts.

Add benchmark for insert_many
@nipunn1313
Copy link
Contributor Author

nipunn1313 commented Sep 21, 2016

Some improvements

  • Rebased onto the noinline benchmark suite
  • Added tests for cases with bad size hints
  • Optimized copying to eliminate n^2 behavior of many-inserts
  • Added a benchmark for insert_many 100 elements followed by insert_many 100 elements again. Tests for O(n) performance rather than O(n^2)
Nipunns-MBP:rust-smallvec nipunn$ git checkout master ; cargo bench bench ; git checkout insert_slice ; cargo bench bench
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
    Finished release [optimized] target(s) in 2.32 secs
     Running target/release/bench-ef222b4531a86f9a

running 4 tests
test bench_extend  ... bench:         146 ns/iter (+/- 23)
test bench_insert  ... bench:       1,374 ns/iter (+/- 123)
test bench_push    ... bench:         439 ns/iter (+/- 128)
test bench_pushpop ... bench:         359 ns/iter (+/- 115)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

     Running target/release/deps/smallvec-a59689f47ae774cb

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

Switched to branch 'insert_slice'
   Compiling smallvec v0.2.0 (file:///Users/nipunn/src/rust-smallvec)
    Finished release [optimized] target(s) in 2.62 secs
     Running target/release/bench-ef222b4531a86f9a

running 5 tests
test bench_extend      ... bench:         153 ns/iter (+/- 30)
test bench_insert      ... bench:       1,433 ns/iter (+/- 610)
test bench_insert_many ... bench:         497 ns/iter (+/- 180)
test bench_push        ... bench:         429 ns/iter (+/- 155)
test bench_pushpop     ... bench:         358 ns/iter (+/- 171)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured

     Running target/release/deps/smallvec-a59689f47ae774cb

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

I feel good about this iteration for handling the insert-many use case.

Copy link
Member

@jdm jdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing! I wanted to take the time to reason through this implementation carefully, and I'm glad I did!

unsafe {
let ptr = self.as_mut_ptr().offset(index as isize);
let old_len = self.len;
ptr::copy(ptr, ptr.offset(lower_size_bound as isize), old_len - index);
Copy link
Member

@jdm jdm Sep 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to assert index <= old_len or this is unsafe.

}
let num_added = self.len - old_len;
if num_added < lower_size_bound {
// Iterator provided less elements than the hint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/less/fewer/

self.reserve(lower_size_bound);

unsafe {
let ptr = self.as_mut_ptr().offset(index as isize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these casts to isize when offsetting are unsafe, since they could become negative numbers and cause us to write outside of the bounds of the array. We should either use to_isize.unwrap() and panic if that occurs, or use a conversion strategy that yields a value that will gives us worse performance but correct behaviour.

@nipunn1313
Copy link
Contributor Author

Thank you for the careful review! I'm taking another pass at the change with careful attention to integer overflow and casting cases.

@nipunn1313
Copy link
Contributor Author

At a quick readthrough, it looks like push, pop, truncate, remove, insert all suffer from the same issue if len > isize::MAX.

It looks like one might be able to manipulate that situation in a call to insert() when len == isize::MAX (then the len would get set to size::MAX + 1).

In any case, here is an updated revision with some bounds checks to insert_many.

@jdm
Copy link
Member

jdm commented Jan 19, 2017

I'm terribly sorry that I dropped the ball here!
@bors-servo: r+

@bors-servo
Copy link
Contributor

📌 Commit 1c254ff has been approved by jdm

@bors-servo
Copy link
Contributor

⌛ Testing commit 1c254ff with merge 03c016c...

bors-servo pushed a commit that referenced this pull request Jan 19, 2017
Implement insert_many<IntoIterator> for SmallVec

This doesn't work as is, but I wanted to put it up for feedback.

According to these docs:
https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html

The copy_nonoverlapping function copies memory as intended, but the ownership of the values copied are not copied over to dest, so when the box is consumed, the SmallVec does not maintain ownership. Not sure how to do this with unsafe code.

<!-- Reviewable:start -->

---

This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-smallvec/28)

<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-travis

@bors-servo bors-servo merged commit 1c254ff into servo:master Jan 19, 2017
bors-servo pushed a commit that referenced this pull request Jan 25, 2017
Implement extend_from_slice and insert_from_slice with memmove optimization

Implement the performance optimized versions of insert_many and extend (from #28). These methods use memmove rather than a looped insert.

If we had function specialization, we could implement these without exposing new methods. Up to the maintainers whether we want to support these new methods.

<!-- Reviewable:start -->

---

This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-smallvec/29)

<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants